Model compression as constrained optimization, with application to neural nets. Part II: quantization
Authors
Abstract
We consider the problem of deep neural net compression by quantization: given a large, reference net, we want to quantize its real-valued weights using a codebook with K entries so that the training loss of the quantized net is minimal. The codebook can be optimally learned jointly with the net, or fixed, as in binarization or ternarization approaches. Previous work has quantized the weights of the reference net, or incorporated rounding operations into the backpropagation algorithm, but this has no guarantee of converging to a loss-optimal, quantized net. We describe a new approach based on the recently proposed framework of model compression as constrained optimization (Carreira-Perpiñán, 2017). This results in a simple iterative “learning-compression” algorithm, which alternates a step that learns a net of continuous weights with a step that quantizes (or binarizes/ternarizes) the weights, and is guaranteed to converge to a local optimum of the loss for quantized nets. We develop algorithms for an adaptive codebook or a (partially) fixed codebook. The latter includes binarization, ternarization, powers-of-two and other important particular cases. We show experimentally that we can achieve much higher compression rates than previous quantization work (even using just 1 bit per weight) with negligible loss degradation.
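The compression step of the learning-compression alternation described above can be illustrated with a small sketch. Assuming an adaptive codebook, quantizing the weights with a K-entry codebook that minimizes squared distortion amounts to running 1D k-means over the weight values; the sketch below (NumPy, with a hypothetical `c_step` helper) shows only this compression step, omitting the learning step that retrains the continuous weights with a penalty toward the quantized ones:

```python
import numpy as np

def c_step(weights, K, n_iter=20):
    """Compression (C) step sketch: fit a K-entry codebook to the
    current real-valued weights via 1D k-means, then replace each
    weight with its nearest codebook entry."""
    w = weights.ravel()
    # initialize the codebook from quantiles of the weight distribution
    codebook = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(n_iter):
        # assignment: index of the nearest codebook entry per weight
        idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # update: each entry becomes the mean of its assigned weights
        for k in range(K):
            if np.any(idx == k):
                codebook[k] = w[idx == k].mean()
    return codebook[idx].reshape(weights.shape), codebook

# usage: quantize a random weight matrix to K = 4 values (2 bits/weight)
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8))
w_quantized, cb = c_step(w, K=4)
```

A fixed codebook (e.g. binarization to {−1, +1}) corresponds to skipping the codebook update and keeping only the nearest-entry assignment.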
Related papers
Model compression as constrained optimization, with application to neural nets. Part I: general framework
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. We give a general formulation of model compression as constrained optimization. This includes many types of compression: quantization, low-rank decomposition, pruning, lossless compression and others. T...
Model compression as constrained optimization, with application to neural nets
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. Firstly, we give a general formulation of model compression as constrained optimization. This makes the problem of model compression well defined and amenable to the use of modern numerical optimization ...
Optimal Neural Net Compression via Constrained Optimization
Compressing neural nets is an active research problem, given the large size of state-of-the-art nets for tasks such as object recognition, and the computational limits imposed by mobile devices. Firstly, we give a general formulation of model compression as constrained optimization. This makes the problem of model compression well defined and amenable to the use of modern numerical optimization...
APPLICATION NEURAL NETWORK TO SOLVE ORDINARY DIFFERENTIAL EQUATIONS
In this paper, we introduce a hybrid approach based on a neural network and an optimization technique to solve ordinary differential equations. In the proposed model, we use a hyperbolic secant transfer function in the hidden layer of the neural-network part and the BFGS technique in the optimization part. In comparison with existing similar neural networks, the proposed model provides solutions with high accuracy. Numerica...
Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM
Although deep learning models are highly effective for various tasks, such as detection and classification, the high computational cost prohibits the deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on model compression and acceleration of deep models. We model a low bit quantized neural network as a constrained optimization problem. Th...
Journal: CoRR
Volume: abs/1707.04319
Published: 2017